Skip to content

[pull] master from php:master#587

Merged
pull[bot] merged 1 commit intoturkdevops:masterfrom
php:master
Dec 12, 2025
Merged

[pull] master from php:master#587
pull[bot] merged 1 commit intoturkdevops:masterfrom
php:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Dec 12, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

GB18030-2022 is the current official standard, superseding the previous 2005 and 2000 versions. It is essential for modern Chinese text processing for the following reasons:

    1. Superset Relationship: GB18030 is a strict superset of CP936 (GBK) and EUC-CN (GB2312). Using GB18030 as the detection target covers all characters in these older encodings while enabling support for a much wider range of characters.
    2. Extended Character Coverage: The 2022 standard includes significant updates, covering over 87,000 characters. It adds support for CJK Extensions (C, D, E, F, G) and updates mappings for rare characters that were previously mapped to the Private Use Area (PUA) in the 2005 version. This is critical for correctly handling names containing rare characters (e.g., in banking or government data).
    3. Backward Compatibility: It is safe to promote GB18030-2022 as the preferred encoding. Files encoded in EUC-CN or CP936 are valid GB18030 streams.

This PR adds GB18030-2022 to the default encoding list for CN.
@pull pull bot locked and limited conversation to collaborators Dec 12, 2025
@pull pull bot added the ⤵️ pull label Dec 12, 2025
@pull pull bot merged commit 1f3fe93 into turkdevops:master Dec 12, 2025
1 of 2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant